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Abstract — An important problem in bioinformatics is the inference of 
gene regulatory networks (GRN) from temporal expression profiles. In 
general, the main limitations faced by GRN inference methods is the 
small number of samples with huge dimensionalities and the noisy 
nature of the expression measurements. In face of these limitations, 
alternatives are needed to get better accuracy on the GRNs inference 
problem. This work addresses this problem by presenting an alternative 
feature selection method that applies prior knowledge on its search 
strategy, called SFFS-BA. The proposed search strategy is based 
on the Sequential Floating Forward Selection (SFFS) algorithm, with 
the inclusion of a scale-free (Barabasi-Albert) topology information in 
order to guide the search process to improve inference. The proposed 
algorithm explores the scale-free property by pruning the search space 
and using a power law as a weight for reducing it. In this way, the 
search space traversed by the SFFS-BA method combines a breadth- 
first search when the number of combinations is small ({k) < 2) with a 
depth-first search when the number of combinations becomes explosive 
({k) > 3), being guided by the scale-free prior information. Experimental 
results show that the SFFS-BA provides a better inference similarities 
than SFS and SFFS, keeping the robustness of the SFS and SFFS 
methods, thus presenting very good results. 

Keywords: SFS, SFFS, feature selection, reverse-engineering, gene 
networks inference, systems biology, bioinformatics. 

1 Introduction 

One of the most challenging research problems for System 
Biology nowadays is the inference (or reverse-engineering) of 
gene regulatory networks (GRNs) from expression profiles. 
This research issue became important after the debioinformat- 
icsvelopment of high-throughput technologies for extraction 
of gene expressions (mRNA abundances or transcripts), such 
as DNA microarrays Q or SAGE |2|, and more recently 
RNA-Seq [Sj. This problem regards revealing regulatory re- 
lationships between biological molecules in order to recover 
a complex network of interrelationships, which can describe 
not just diverse biological functions but also the dynamics of 
molecular activities. It is very important to understand how 
many biological processes happen and in most cases, how to 
prevent it from happening (diseases). 
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In the context of expression profiles, a big challenge that 
researchers need to face is the large number of variables 
or genes (thousands) for just a few experiments available 
(dozens). In order to infer relationships among those variables, 
it is needed a great effort in developing novel computational 
and statistical techniques that are able to alleviate the intrinsic 
error estimation committed in the presence of small number 
of samples with huge dimensionalities. 

In general, it is not possible to recover the GRNs very 
accurately. The main reasons for this are the lack of informa- 
tion about the biological organism, the high complexity of the 
networks and the intrinsic noise of the expression measure- 
ments. In this context, there are several recent initiatives to 
overcome such limitations by incorporating other information 
on the inference/prediction method. One kind of initiative is 
the use of the functional gene information, e.g., from the Gene 
Ontology, Proteome, KEGG, among others, into the clustering 
process, resulting in more biologically meaningful clusters |4|, 
0, (61. Another alternative is by using biological information 
for the discovery of transcriptional regulation relationships, 
i.e., to infer GRNs fT], fS], |9|. A variety of biological data 
integration techniques for GRNs inference are described in 

ifToii. ifTTii. ca, ca. 

Although the integration of biological information with 
mathematical models is critically important in discovering 
novel biological knowledge, it is restricted by the prior biolog- 
ical information of each gene or biological entity. One way to 
use prior information and make the methods less restrictive is 
the use of information about local or global prior knowledge of 
an organism instead of an information about a single gene, e.g., 
to use the network structure/topology as prior information. In 
this way, the integrated use of multiple data types together 
with local and global topological properties could be decisive 
for the effective prediction of GRNs and their functions in face 
of the known limitations ifTOl. |[T4ll. ifBl. l[T6ll. ifTTl. 

The analysis of local and global biological network prop- 
erties and its application on the inference process is very 
recent and promising liTSl . |[T9l , ifTTIl . 1201 . 1211 . For example, 
the application of network structure by the inference methods 
makes use of similarities of connected network modules [ 22l . 
structural and graph-theoretic interpretation for the network 
components li23iL [il7iL taking into account the network sparse- 
ness 1241 . 1251 . 1261 . gene network motifs 1271 and the search 
for cliques in network graphs 1281 . More important, biological 



networks and particularly GRNs are known not only to be 
sparse, but also organized, so as nodes belonging to different 
connectivity classes |29|. These examples show the importance 
of such problem and the need for new methodologies to 
overcome it. 

The information about network topology can help the inves- 
tigation of biological process by adopting the complex network 
theory and its properties fSOl, |[3T|, 1321 , 133 1. It is known 
that many kinds of relationships can be successfully described 
by complex networks. In particular, the complex networks 
theory describes various types of network topologies, each one 
with well defined properties, and it has been broadly applied 
to characterize biological processes and gene relationships 
involved. Some biological networks, for example, were shown 
to present the scale-free property, in which many nodes have 
a low degree and a few of them have a high degree (hubs), in 
which the degree distribution is approximated by a power-law 
distribution (31, El, EH, El, EH, ES, El- In general, 
topological patterns and its structural analysis is one of the 
most promising topics in the analysis of complex networks 
flO|, 1 41 1, and particularly the application of structural prop- 
erties of the network can be a very valuable prior information 
to be used by the GRNs inference methods. 

The use of network topology models for the simulation and 
analysis of GRNs have been recently described in 1421 and 
the same models are further explored in the present paper. 
In this work, a new method is proposed for the inference 
of GRNs from expression profiles by incorporating a scale- 
free topology and applying a conditional dependency criterion 
function. Thus, it is suggested a method that takes the scale- 
free topology into account as prior information in the inference 
process. This leads to a better inference of networks presenting 
scale-free property. The main purpose of this paper is to show 
the interest of taking into account information about the scale- 
free topology of the network in order to improve the inference 
process and make it more suitable for a class of problems, i.e., 
scale-free networks. 

The inference process is conducted by observing the condi- 
tional dependence of a target gene given its potential predictors 
through temporal expression profiles, and by applying the 
mean conditional entropy as criterion function 1431 . B4l . B31 . 
1461 . This process has been recognized as an appropriate 
statistical tool to model direct interactions between genes 1291 . 

The main contribution of this work is the proposal of 
a new feature selection method for GRNs inference from 
temporal expression profiles. Our inference method is based 
on a previous feature selection algorithm 1471 . BSl . with 
the inclusion of scale-free topology as a prior information, 
in which the search space traversed is relatively small and 
provides encouraging results. 

Next sections (Sections [2] and |3]) introduce a brief back- 
ground on the complex network theory and the network 
inference problem. In Section [4j the feature selection problem 
is discussed in more detail, including a short description 



optimal. Section [6] describes our proposed feature selection 
method (SFFS-BA). Section [7] shows some experimental re- 
sults. Finally, Section [8] concludes the work, discussing future 
perspectives. 

2 Complex Networks Theory and Bio- 
logical Networks 

The genes of a network can be characterized by its degree, i.e., 
the number of connections with other genes of these network 
that it has. By considering directed networks, there are two 
kinds distinct relationships. The in-degree is the number of 
directed connections received by a gene. The out-degree is 
the number of directed connections edges sent by a gene. 
The individual gene degrees can be used to estimate the 
degree distribution P{k) and as a result, characterize the whole 
network, i.e., a global network property. 

The uniformly-random Erdos-Renyi (ER) B9l complex 
network model is based on randomly connected vertices. 
This model assume the hypothesis that complex systems are 
connected at random, leading to a Poisson degree distribution 
with peak at the average degree {k), indicating that the most 
of the genes have a degree close to {k). In other works, the 
ER model has a statistically homogeneous degree distribution 

EH. 

On the other hand, the scale-free network structure proposed 
by Barabasi and Albert |50| (BA), is based on a heterogeneous 
distribution on its vertex degree, in which few genes have 
a large number of connections and the most of genes have 
few connections. The absence of a typical degree led to this 
complex network model to be described as "scale-free" 1381 . 
More specifically, the scale-free structural property is charac- 
terized by a power-law in its connections (degree) distribution. 
In other words, the probability P{k) of a gene to interact with 
k other genes decays as a power law 



P{k) - k-\ 



(1) 



of the SFS and SEES techniques. Section |4.3| discusses the 
intrinsically multivariate prediction issue and how it can affect 
the greedy feature selection algorithms in such way that 
the achieved solution may be relatively far away from the 



in which 7 is a numerical constant. 

In general, numerous networks, such as the Internet, hu- 
man collaboration networks and metabolic networks, follow a 
scale-free structure |38|. In particular, most known biological 
networks present a scale-free structure 1331 . implying that their 
distribution on its gene relationships (degree k) is irregular, 
a large number of connections (edges) is concentrated on a 
small number of genes, while large number of genes have 
few connections. In this case, scale-free networks have a high 
probability of exhibiting hubs |50|. 

In particular, by considering the transcriptional regulations 
of the Saccharomyces cerevisiae l35l . it was found a close 
relation between proteins and genes which presents a degree 
distribution with an exponential decay very similar to a power 
law. In the work developed by Earkas et ah |36|, it was found 
an overlap between the connectivity distribution of scale- 
free and Saccharomyces cerevisiae transcriptome networks. In 
such work it was suggested a potential regulatory relationship 
among its genes, in which a small number of transcription 
factors are responsible for a complex set of expression patterns 
under diverse conditions. 



Regarding the constant decay 7, it was reported that scale- 
free networks describe the Escherichia coli metaboHc net- 
works and its metaboHc reactions follows a power-law, with 
7= 2.2 I34I . It is also known that the probability of a given 
yeast protein to interact with k other yeast proteins follows 
a power-law, with 7= 2.4 [^5T|, |[5?|. In general, the degree 
exponent 7 is usually in the range 2 < 7 < 3 ISTIl . ISSll . In 
summary, the scale-free complex network model has been 
effectively used to simulate and describe the behavior of 
biological networks ESl, El, ES, EH, E21, El, El- 



theory based criterion functions are used to detect 1-to-l and 
N-to-1 relationships E3, (63, (SI, im, (63, relying on 
the uniformity of the conditional probability distributions of 
the target given the candidate predictors as a whole (larger 
uniformity leads to higher entropy, which in turn leads to 
smaller mutual information). 

The literature related to modeling and identification of 
GRNs is huge and on rapidly increasing, which indicates its 
importance. The reader is referred to iTTTIl . ifTSll . 1661 , 1671 , 
l68l , l69l for reviews on this subject. 



3 GRN Inference 

The combination of expression analysis, perturbations, treat- 
ments and gene mutations may indicate information about 
molecular or specific functions of the genes. Gene regulatory 
networks (GRN) inference from expression data, a process 
also known as reverse engineering, is a difficult computational 
task due to the huge data volume (number of genes or expres- 
sion profiles) and to the small number of available samples, 
including the large complexity of biological networks, thus 
representing an important challenge in bioinformatics and 
computational biology researches l53l . 

The GRNs inference from temporal expression data tries 
to identify the variation of the expression levels along time, 
becoming possible to indicate information such as metabolic 
pathways, cell cycle and mapping of modifications caused by 
stimuli. It can be used as a model for functional representation 
of gene interactions. 

It is important to highlight that the network inference has the 
objective of discovering interaction networks between genes 
that are potentially interesting from the biological point of 
view. The relationships among genes are suggested according 
to some estimator and can be examined or validated by wet-lab 
experiments. As such experiments have a high cost in terms 
of financial, human and time resources, the main idea is to 
offer to the specialists a reduced number of hypotheses that 
satisfactorily identify a certain phenomenon of interest. 

There are several approaches for modeling and identification 
of GRNs. Examples include Boolean Networks l54l and its 
stochastic version (Probabilistic Boolean Networks) |55|, Dif- 
ferential Equations |56| and Bayesian Networks l57l , to name 
but a few. This work focuses on the Probabilistic Boolean 
Networks (PBN) model, since it captures global properties 
of GRNs while dealing well with settings presenting limited 
number of samples. 

Regarding the feature selection approaches to infer GRNs, 
there are mainly three types of criterion functions. The cor- 
relation based criterion functions are those that measure 1-to- 
1 relationships, often employed to identify co-regulation be- 
tween genes, functional modules and clusters |58|. It does not 
take into account multivariate relationships, i.e., the expression 
of a given target being regulated by a set of two or more 
genes with multivariate interaction. Bayesian error estimation 
based criterion functions evaluate the estimated errors present 
in the joint probability distribution of a target gene given its 
candidate predictor genes l59l , l6Ql . l6Tl , which is capable to 
detect multivariate (N-to-1) relationships. Finally, information 



4 Feature Selection 

Pattern recognition methods allow the classification of objects, 
i.e., a class or label is assigned to each object based on 
its features. In many applications, and specifically in GRN 
inference, the feature space dimension of such objects tends 
to be very high while the number of samples is very limited. 
In this context, the study and development of techniques for 
dimensionality reduction become mandatory. 

Feature selection is a possible approach to perform dimen- 
sionality reduction. A feature selection method looks for sub- 
sets of features that lead to a good representation, classification 
or prediction of the objects classes. It is composed by two 
main parts: a search algorithm and a criterion function which 
assigns a quality value to the feature subsets. 

The only way to guarantee optimality of the solution is 
to investigate the entire space of possible subsets (exhaustive 
search), although depending on some criterion function con- 
straints adopted (e.g. monotonical or U-shaped), it is possible 
to reach the optimal solution by searching for a constrained 
subset space by applying "branch- and-bound" methods ITOl , 
|71|. The exhaustive search is computationally unfeasible in 
general, and especially for inference of GRNs which involves 
data with thousands of features (genes). There are many 
heuristics proposed for feature selection. Two classical feature 
selection heuristics are briefly introduced and discussed below. 

4.1 Sequential Forward Selection (SFS) 

The Sequential Forward Selection (SFS) is a genuinely greedy 
feature selection algorithm in the sense that it includes the 
best feature according to the criterion function in each step. 
It starts with an empty set (bottom-up approach) and adds 
the best individual feature to the partial solution. In the next 
step, it looks for the best feature that, jointly with the feature 
already included in the partial solution, forms the best pair. 
This process continues until a stop condition is satisfied, which 
normally is based on a fixed dimension value (cardinality 
of the subset to be returned) or based on the variation of 
the criterion function from the previous to the next step. A 
variant of this algorithm is the Sequential Backward Selection 
(SBS) which starts with the full feature set and eliminates the 
least relevant feature according to the criterion function (top- 
down approach), repeating this process successively until a 
stop condition is satisfied |47|. 

The greedy algorithms such as SFS and SBS present a 
drawback known as nesting effect. This effect occurs because 
the discarded features by using the top-down approach are 



never inserted again to the partial solution, and the inserted 
features in the bottom-up approach are never discarded from 



the partial solution. Section 4.3 presents the reason why this 
phenomenon occurs. 

4.2 Sequential Floating Forward Selection (SFFS) 

The Sequential Floating Forward Selection (SFFS) algorithm 
tries to alleviate the nesting effect by allowing the inclusion 
and exclusion of features in the partial solution in a floating 
way, i.e., without requiring the definition of the number of 
insertions or exclusions l47l . It starts with an empty set 
(cardinality ^ = 0). The SFS algorithm is applied until k = 2. 
For k > 2, the SBS algorithm is applied in order to exclude 
bad features. The SFFS applies alternately the SFS and SBS 
until a stop condition is reached. The best solution of each 
cardinality k is stored in a list. The best solution among them 
is selected as the returned solution of the algorithm and, in case 
of ties, the solution with the smallest cardinality is returned. 
A schematic flowchart of the SFFS algorithm is presented in 
Figure [T] 
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Fig. 1. Schematic fiowciiart of tlie SFFS algoritinnn (46) 
(adapted from (72l). K refers to tine size of the current 
solution subset while d refers to the size of the final 
solution subset. 

SFFS is renowned for presenting an excellent cost-benefit in 
terms of the computational complexity and the quality of the 
returned solution. There are some variants of this algorithm 
(adaptive and generalized floating search methods) that try 
to improve the SFFS solutions at the expense of an increase 
on the computational cost. However, they can not avoid the 
nesting effect completely l72l . 

4.3 Intrinsically Multivariate Prediction 

This section briefly discusses one of the main reasons why 
the feature selection heuristics do not guarantee the optimum 
solution. A target feature is intrinsically multivariate predicted 
(IMP) by its predictor feature set if all predictors combined 
greatly predicts the target behavior, while every properly 
contained subset of the mentioned predictor set has an almost 
null prediction power regarding the target. Formally, a set of 
features X is intrinsically multivariate predictive for the target 
feature Y with respect to A and 5, for < A, 5 < 1 and A < 5, 
if 



where ^ is a criterion function that varies from to 1 (0 
meaning absence of prediction and 1 meaning full prediction) 
173! . The parameters A and 5 usually assume small (less than 
0.2) and large (greater than 0.8) values, respectively, to reflect 
the IMP property. Additionally, an intrinsically multivariate 
predictiveness degree (IMP score) through the maximum value 
of 5 — A can be defined as: 



/y(X)=^y(X)- 



-max^7(Z) 

ZQX 



(3) 



max^7(Z) < A A^7(X) > 5 

zgx 



(2) 



Considering Boolean features, the deterministic exclusive-or 
binary logic (XOR) is an example of an IMP set when two pre- 
dictors may assume any value from { (0, 0) , (0, 1 ) , ( 1 , 0) , ( 1 , 1 ) } 
with uniform probability distribution. In this case, it is impos- 
sible to predict the target based on the observation of just one 
of its predictors, since the target can assume the value or 1 
regardless of the values of each individual predictor. Of course 
when the predictors are combined, the prediction is perfect for 
every instance from { (0, 0) , (0, 1 ) , ( 1 , 0) , ( 1 , 1 ) } (the prediction 
is given by the XOR logic). 

The nesting effect occurs in most feature selection algo- 
rithms and can be explained by the IMP phenomenon. Consid- 
ering an IMP set, its individual features (or subsets of features) 
are not good to predict the target, so they hardly will be 
included in the partial solution of a given sub-optimal feature 
selection. However, an optimum solution can be formed by 
such features together (large IMP score), which implies that 
the considered feature selection probably does not reach this 
solution. Besides, two good individual predictors may not lead 
to a very good pair, since they have a relatively high correlation 
with the target feature, meaning that they have high correlation 
with each other. Section [6] shows that the method proposed 
here can eventually return IMP sets as solutions when its IMP 
score is moderately large, i.e., each individual feature has little, 
but not null, contribution in predicting the target feature, and 
also when the cardinality of such IMP sets is not very high (in 
this case, any good predictor set with large cardinality would 
not be returned due to the high estimation error performed 
when evaluating large dimensionality sets). 

5 Probabilistic Genetic Networks 

Probabilistic Genetic Networks (PGN) is a model proposed 
to represent GRNs. PGNs are based on PBNs, in which the 
selection of the transition function is not deterministic and the 
states of the genes and networks are represented by discrete 
values. PGNs describe a finite dynamical system, discrete in 
time, composed by a finite number of states, in which each 
transcript is represented by a variable. The composition of 
all variables form a vector considered the system state. Each 
vector component has an associated transition function which 
calculates its next value from the previous state of other genes 
(predictors). These functions are components of a transition 
functions vector, which defines the transition from a network 
state to the next state and represents the gene regulation 
mechanism |45|. 

In this model, the gene expression networks are represented 
as a stochastic process ruled by a Markov chain. In other 



words, assuming this principle means that a conditional prob- 
ability of a future event, given the previous events, depends 
only on the immediately previous event. A Markov chain 
is characterized by a transition matrix 7tY\x c>f conditional 
probabilities among states, and its elements are denoted by 
Py\j^, and a vector of initial states ^o- A PGN is a Markov 
chain {tZj^x^so) in which some axioms are assumed fl31 : 
i the transition matrix TZy^x is homogeneous, i.e., Py\j^ is not 

a function oft. The state transition probability is constant; 
ii Py\x > 0, i.e., all state pairs can be reached (ergodic 

Markov chain); 
iii the transition matrix 71y\x is conditionally independent, 

i.e., for every state pair x,y, Py\j, = lYl=iP{yi\x)'^ 
iv 71y\x is quasi-deterministic, i.e., for every state x, exists a 

state y such that Py\x^ 1. 
Theses axioms are motivated by biological phenomena or 
simplifications due to the lack of samples for model estima- 
tion. The first axiom is a constraint to simplify the estimation 
problem. The second axiom states that all states are reachable, 
i.e., the presence of perturbation or noise can eventually lead 
the system to any state. The third axiom determines that a 
gene expression at a given time instant is independent of the 
expression of other genes at the same instant t. The last axiom 
says that the system has a structural dynamics which is prone 
to small noises (451. 
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Numer of Predictors in the Result Set 



Fig. 2. Criterion function behavior by the inclusion of 
new predictors in the result set (optimum value is zero). 
The black curve (Normal) is expected for targets with 
well defined predictors in which each predictor has a 
good contribution in predicting the target behavior. The 
blue curve (IMP) is expected for targets with intrinsically 
multivariate predictors in which each predictor is not good 
enough to predict the target behavior, as opposite to the 
whole predictor set. The red curve (Source) is expected 
for targets that have no predictors. 



6 SFFS WITH Structural Properties 
(SFFS-BA) 

The proposed method is based on the Probabilistic Genetic 
Networks (PGN), which is described in \A^\ and in Section |5] 
The proposed method considers the four axioms established 
by the PGN model and proposes two new constraints: 

• For every target gene /, by adding a new predictor xt 
in the result set with cardinality > 1, there should be 
an information gain in the prediction of the target gene, 
whenever the chosen predictor is part of the true result 
set. If the information gain by adding a new predictor in 
the result set is poor, the predictor could not be part of 
the true result set or there is no data enough to make the 
prediction. In both cases, the inclusion of the predictor in 
the result set should be avoided. 

• The network topology follows a power-law in its connec- 
tions distribution, i.e., a scale-free network structure such 
as described in Section |2l 

By assuming the PGN model and these new constraints, 
the main contribution of our method is to include structural 
information as a prior knowledge to perform a search on a 
reduced space, thus achieving better results. 

The idea is based on the assumption that a gene with no 
predictors tends to have a random behavior, while a gene with 
predictors tends to have a more ordered behavior. In this way, 
it is possible to expect that a source gene in a GRN (i.e., with 
no predictors) presents a behavior with small variations in the 
criterion function on trying to identify a possible predictor to 
it. In other words, small variations are expected on the criterion 
function values by adding new predictors on the result set of 
a source, as shown by Figure [2] (Source). 



On the other hand, when a gene has predictors, it is expected 
a distinct behavior, mainly on trying a possible predictor that is 
part of the result set (true positive). In this way, it is expected 
some significant variation on criterion function values when 
performing a search for possible predictors, as showed by 
Figure [2] (Normal and IMP). 

The curves shown in Figure |2] were obtained using the 
AGN model described in I.42.I . for which it was applied the 
mean conditional entropy as a criterion function ll46l in order 
to exemplify the adopted assumption for the three discussed 
cases, i.e., normal, source and IMP. 

Given the network topology constraint and the IMP prop- 
erty, the proposed algorithm performs a search for best and 
worst individual predictors, based on the SFFS -MR algorithm 
1 48], by considering all genes of the network. SFFS-MR is 
an algorithm that applies the classical SFFS for several initial 
features, considering good and bad individual features. In the 
second step, the search is performed again for all genes, but 
the algorithm chooses the target genes that present a prediction 
gain (i.e., the criterion function value moves closer to the 
optimum) by adding a new predictor to its result set. This 
increased result set is preserved in the next iteration. The target 
genes with small information gain, i.e., poor predictions when 
increasing its predictor set, are not considered for the next 
iteration, as well as the predictor set that reaches the optimal 
value of the criterion function or gets too close to it. 

In this context, the scale-free topology and the above pre- 
sented assumption are considered by the proposed algorithm 
in order to integrate the prior knowledge (scale-free topology), 
applied to reverse-engineering of GRNs from temporal expres- 
sion profiles. By considering that the scale-free network model 



is characterized by a power law P{k) ^ k~^ in its connections 
distribution (Section [2]), the same power law is considered to 
prune the search space on each iteration. 

Algorithm 1 Network-Inference (targets, exps, 7, A) 
1: var list exelist 

2: var integer k ^ 1, n ^ targets.size^Q 
3: for / = 1 to ^ do 
4: exelist .8ippend(targets[i\, 0, 1, 0) 
5: end for 

while ^ > 1 do 
for / = 1 to ^ do 

[target ^psets^cfv^ gain] ^ ^x^//^^removefirst() 
[new psets^newcfv, gain] ^ SFFS- 

BA(target,cfv,psets,k,exps,A) 
10: Q^xelist 3.ppQ^nd(t ar get, new p sets, new cfv, gain) 

11: end for 

12: SortPredictorSetsbyGain(exelist) 
13: n^nxk~^ 
14: k^k^l 

15: end while 

16: return exelist 



Algorithm 2 SFFS-BA (target, cfv,psets,k,exps,A) 



if psets = then 



for predict or idx = 1 to exps.sizcO do 

psets.3.ppQ^nd(predictoridx) 
end for 
end if 

while psets is not empty and /^^^^i-. first. cardinality < k 
do 
7: new p set ^ /7^^^^.removefirst() 
8: newcfv ^ S¥S(target ,newpset ,k,exps) 
9: if newcfv < bestcfv and (bestcfv — newcfv) > A then 
10: newcfv ^ SBS(target,newpset,exps) 

11: bestcfv ^ newcfv 

12: bestset ^ newpset 

13: end if 

14: if ^^w/?^^^ cardinality = 1 then 
15: /7^^^^.append(^^w;?^^0 

16: end if 
17: end while 
18: if ^ > 1 then 
19: psets ^ bestset 
20: end if 
21: return [psets, best cfv,(cfv — bestcfv)] 

Algorithm [T] starts by considering the targets and all avail- 
able samples (temporal expression profile, called exps) in order 
to select the individual predictors, i.e., ^ = 1. In the following. 
Algorithm |2] (SFFS-BA) is applied in order to discover the 
best features of each target gene, which are ranked according 
to the adopted criterion function. One important difference of 
the proposed feature selection method is that the algorithm 
will explore the search space in steps, i.e., the predictors are 
chosen iteratively according to the cardinality parameter k. 



Another difference is that for ^ = 1 the SFFS-BA algorithm 
will return all predictor sets and the best criterion function 
value in order to explore all the individual predictors in the 
next iteration and to better recover the predictors of the IMP 
targets. From ^ > 1, the algorithm begins to return only the 
best set, assuming that some of the true predictors would be 
in the selected predictors set. 

At the end of each iteration of the Algorithm [TJ the 
target genes are sorted by the prediction gain, the number of 
considered targets for the next iteration is updated following 
a power law, given by n = nxk~^ and the cardinality of the 
result set is updated (k^ k-\-l). 

In this way, when k = 2, the next iterations will consider just 
the target genes with higher prediction gain when increasing 
its predictor cardinality k. It is important to notice that target 
genes that reach the optimal value of criterion function or get 
too close are not considered for the increasing on its predictor 
cardinality. The search is performed while the number of target 
genes n> I (stop condition). 



In summary, the SFFS-BA differs from SFFS (Section [4:2) 
because of its iterativity, the exploration of all combinations 
of predictors set with cardinality k <2 and the inclusion of 
a search space pruning method based on the assumption that 
the expression data (input) were generated from a scale-free 
network. 

Algorithms [T] and [2] present the specification of the proposed 
feature selection algorithm: SFFS-BA. 

The parameter exps represents the temporal expression 
profile, in which the genes are generally arranged in the rows 
and experiments in the columns. The parameter 7 is a constant 
value that determines the exponential decay, i.e., the number of 
targets that will be considered in the next iteration (predictor 
set expansion). A criterion function value variation (A) is also 
included. The A value prevents that minor variations of the 
criterion function (< A) cause the increase of the predictors 
subset. The present paper adopted 7= 2.5 and A = 0.05. The 
adopted 7 is related with the mean value found in the literature, 
which is usually in the range 2 < 7< 3 ISTl . Ii38il . 

The application of the algorithm to predict a single gene 
or a set of genes of interest instead of the entire network is 
straightforward by selecting the targets parameter. 

7 Experimental Results 

This section presents the experimental results obtained by con- 
sidering a synthetic networks approach, which is described in 
pSI. The artificial gene networks (AGNs) were generated by 
considering the uniformly-random Erdos-Renyi (ER) topology, 
the scale-free Barabasi- Albert (BA) and the small- world Watts- 
Strogatz (WS). 

For all experiments, the network models (ER, BA and WS) 
were applied with ^ = 100 vertices (genes). The average degree 
{k) per gene varied from 1 to 5, and the number of observed 
time instants (signal size) varied from 5, 10, 15, 20 to 100 
in steps of 20. For each gene gt of the network, its value 
was given by a randomly selected function from 3 possible 
Boolean functions {/| 5/2 5/3 }, which represents different 
behaviors or functions assumed by each gene gt. In order to 



assign a robust structural dynamics with small noise to the 
networks, the probabilities of each function be selected are 
given by cf = 0.98, cf = 0.01, cf = 0.01, / =1,...,^. 
With these probabilities, the PGN axioms ii (all possible states 
are reachable) and iv (quasi-deterministic setting) are satisfied 
(Section |5]). 

The network identification method described in B6ll im- 
plements feature selection methods for network inference. 
By applying the SFS and SFFS as search strategies and the 
mean conditional entropy as criterion function. This method 
was applied in order to identify the networks from simulated 
temporal expressions. The same method, criterion function 
and other parameters (default) were kept fixed during the 
comparative analysis with SFFS-BA. 

In order to measure the similarity between the synthetic 
(A) and the inferred (B) networks, we adopt the PPV (Pos- 
itive Predictive Value, also known as accuracy or precision) 
and Sensitivity (or recall) fJ4\ . EH, which are widely used 
to compare the results of the GRNs inference methods. 
Since the PPV and Sensitivity are not independent of each 
other, we take into account the geometrical mean between 
them as a similarity measure, given by: Similarity {A ^B) = 
y^PPV{A,B) X Sensitivity {A,B). 

The experimental results were obtained from 50 simulations 
by using different signal sizes (i.e., number of time points) 
and {k) values. The first experiment was performed in order 
to compare the three methods (SFS, SFFS and SFFS-BA) 
with respect to the temporal expressions size, which is a 
critical issue in bioinformatics problems. Figure |3] presents 
these results, in which the similarity measure was calculated 
by taking into account the average results for all variations of 

{k). 

It is possible to notice that all methods have an increase on 
its performance by increasing the number of observations in 
the three network topologies (ER, WS and BA). However, the 
improvement of the SFFS-BA occurs earlier by using just 20 
time points, consistently outperforming the other two meth- 
ods from this point forward. Meanwhile, the SFFS slightly 
outperforms the SFS only with the signal size greater than 80 
time points, i.e., the difference of the similarity rates is smaller 
than the difference achieved by SFFS-BA by considering all 
network topologies. 

In addition, the SFFS-BA similarity curve (Figure [3]) shows 
a more significant improvement with the expansion on signal 
size by considering the BA network topology, as expected. 
However, considering ER and WS network topologies, the 
improvement of the SFFS-BA not only outperforms the other 
two methods but also it is consistent, even in the presence of 
some perturbations in the temporal signal, which is implied 
by the stochasticity in the application of transition functions. 
Figure |4] (a) and (b) shows the boxplots of the similarity values 
for each number of time points. It is possible to notice a very 
small variation in the boxplots, indicating stable results for all 
time points. These results are an important indicative of the 
stability of the proposed methodology. 

The second experiment was performed in order to compare 
the robustness of the methods by increasing the complexity 
of the networks in terms of its average degree {k). Figure 



■■■•-SFFS 
SFFS-BA 




(a)ER 



5 10 15 20 30 40 50 60 70 80 90 100 

Signal Size 




5 10 15 20 30 40 50 60 70 80 90 100 

Signal Size 



■■■•■■■SFFS 
SFFS-BA 






:rr:=^: 



(c)BA 



5 10 15 20 30 40 50 60 70 80 90 100 
Signal Size 

Fig. 3. Similarity measure obtained by SFS, SFFS and 
SFFS-BA applied to infer network edges from temporal 
expression profiles with different number of time points 
(signal size). The similarity represents the mean over 50 
executions for each network topology. 



|5] presents the average results for all variations of signal 
size (number of time points). It is possible to notice the 
similarity down-grade with the increase of average degree (^) 
for the three algorithms. However, there was an improvement 
in results from (/:) = 1 to (/:) = 2 for ER and BA topologies. 
This behavior can be explained by the fact that less complex 
networks (^) = 1 have several genes with no predictor, but the 
inference methods tend to find false positives, thus reducing 
its similarity ratio. 

In this context, the SFFS-BA algorithm also outperforms 
the SFS and SFFS, presenting a soft decrease of similarity 
with the increase of average degree (^) for ER topology. In 
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Fig. 4. Distribution (boxplots) of the similarity values 
obtained by SFS, SFFS and SFFS-BA applied to infer 
network edges from temporal expression profiles with 
different number of time points (signal size) from 50 
executions of each network topology. 



the presence of a network structure as is the case of BA and 
WS topologies, the decrease of similarity was less smooth, but 
even in these cases the SFFS-BA presents better results. 

With regard to the IMP genes with cardinality greater or 
equal than 3, it is important to notice that the SFFS-BA tends 
to consider them if, at each step, an individual predictor added 
to the subset has a prediction gain larger than the predictors of 
the genes considered as sources (absence of predictors). The 
tendency is that a moderately IMP set is detected if the number 
of samples contained in the gene expression matrix is sufficient 
to estimate its joint probability distributions. However, there 
are two situations in which it is possible that SFFS-BA 
considers the target gene from an IMP set as source gene. The 
first situation is the case in which the dimension of the IMP 



■SFS 

■SFFS 

-SFFS-BA 




(a)ER 



2 3 4 

Average Number of Edges 





■■■■^-SFS 

-•-SFFS 
SFFS-BA 
















'-■■■■■■■ * "■^^^^^;::::>^ 








"^^^^^=^^^^^^^^ 















(b) WS 



Average Number of Edges 



■SFS 
■SFFS 
-SFFS-BA 






(c)BA 



Average Number of Edges 

Fig. 5. Similarity measure obtained by SFS, SFFS and 
SFFS-BA applied to infer network edges from different 
network complexities in terms of average degree (k). The 
similarity represents the mean over 50 executions for 
each network topology. 



set is excessive for the number of samples available, making 
the error estimation of the joint probability distributions a 
crucial factor (the number of parameter to be estimated grows 
exponentially as a function of the cardinality). The second 
case refers to the strongly IMP sets where all its properly 
contained subsets offer a very poor information gain with 
regard to the target. This problem is inherent to the feature 
selection methods that explore only a subspace of all possible 
solutions, as SFFS-BA does. The only way to guarantee that 
IMP features are returned is through a exhaustive search for 
the whole solution space. 



8 Conclusion 

This work presents an iterative floating search strategy for the 
inference of gene regulatory networks by including the scale- 
free assumption as a prior information in the inference process. 
Given the known limitations, our focus is the inclusion of prior 
knowledge on search methods. In particular, by presenting 
a more suitable and efficient algorithm for the inference of 
GRNs from temporal expression profiles, which presents a 
small number of samples and huge dimensionalities (genes). 

The proposed algorithm is based on the assumption that 
several biological networks can be approximated by a scale- 
free topology. The presented method exploits this property by 
pruning the search space and using a power law as a weight 
for reducing the search space. In this context, the search space 
traversed by the SFFS-BA method combines a breadth-first 
search when the number of combinations is small ({k) < 2) 
with a depth-first search when the number of combinations 
becomes explosive ((^) > 3). 

The experimental results show that the SFFS-BA provides 
better inference accuracy than SFS and SFFS, when consid- 
ering small signal sizes with 20-30 time points and also with 
large ones, with 100 time points. In addition, the SFFS-BA was 
able to achieve 60% of similarity on network recovery after 
only 50 observations from a state space of size 2^^, presenting 
very good results. 

The SFFS-BA has also proved to be robust and stable, as 
SFS and SFFS, when submitted to the increasing complexity 
of the networks in terms of its average degree (k) . The robust- 
ness ans stability are important properties for the inference 
methods, even in the presence of some perturbations in the 
temporal signal, implied by the stochasticity in the application 
of transition functions. Besides, the SFFS-BA showed better 
results than the SFS and SFFS. 

A possible extension of the present work would be the 
inclusion of the small-world (WS) 1751 topology information 
in order to guide the search process for the correct topology 
inference of these networks. Also, we plan to apply this 
technique to infer GRNs from real data. 
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